Recurrent Neural Net Learning and Vanishing Gradient

نویسنده

Sepp Hochreiter

چکیده

Recurrent nets are in principle capable to store past inputs to produce the currently desired output. This recurrent net property is used in time series prediction and process control. Practical applications involve temporal dependencies spanning many time steps between relevant inputs and desired outputs. In this case, however, gradient descent learning methods take to much time. The learning time problem appears because the error vanishes as it gets propagated back. The decaying error ow is theoretically analyzed. Then methods trying to overcome vanishing gradient are mentioned. Finally, experiments comparing conventional algorithms and alternative methods are presented. Experiments using advanced methods show that learning long time lags problems can be done in reasonable time.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Longer Memory in Recurrent Neural Networks

Recurrent neural network is a powerful model that learns temporal patterns in sequential data. For a long time, it was believed that recurrent networks are difficult to train using simple optimizers, such as stochastic gradient descent, due to the so-called vanishing gradient problem. In this paper, we show that learning longer term patterns in real data, such as in natural language, is perfect...

متن کامل

Overcoming the vanishing gradient problem in plain recurrent networks

Plain recurrent networks greatly suffer from the vanishing gradient problem while Gated Neural Networks (GNNs) such as Long-short Term Memory (LSTM) and Gated Recurrent Unit (GRU) deliver promising results in many sequence learning tasks through sophisticated network designs. This paper shows how we can address this problem in a plain recurrent network by analyzing the gating mechanisms in GNNs...

متن کامل

Learning Multiple Timescales in Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are powerful architectures for sequence learning. Recent advances on the vanishing gradient problem have led to improved results and an increased research interest. Among recent proposals are architectural innovations that allow the emergence of multiple timescales during training. This paper explores a number of architectures for sequence generation and predict...

متن کامل

The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions

Received () Revised () Recurrent nets are in principle capable to store past inputs to produce the currently desired output. Because of this property recurrent nets are used in time series prediction and process control. Practical applications involve temporal dependencies spanning many time steps, e.g. between relevant inputs and desired outputs. In this case, however, gradient based learning ...

متن کامل